DIMESS is a package for analyzing scRNA data. It allows you to try out different distance metrics easily on Seurat objects in order to cluster genes and explore gene expression networks.
We start by loading some packages. The most important is Seurat – DIMESS expects to interact with Seurat objects throughout the entire workflow.
library(Seurat)
library(readxl)
library(dplyr)
source("dimess.R")
Now that we have the necessary packages, we load data. Some of this is example specific, such as the code to attach metadata to the Seurat object. When working with your own datasets, be sure to ensure that your Seurat object has count data with the proper feature (gene) names.
The normalization procedure is basically identical to that described in the Seurat vignettes – the point of DIMESS is to augment rather than replace your existing Seurat workflow.
load("./adenoallraw.RData")
type <- "tumor"
lung <- CreateSeuratObject(adeno.all$TUMOR, min.cells = 3, min.features = 0, project = "nsclc")
metadata <- read_excel("./MetaData.xlsx", sheet = 1)
strsplit.ind2 <- seq(from = 2, by = 2, length.out = nrow(metadata))
metadata <- metadata %>% mutate(piece = unlist(strsplit(Patient_piece, split = "-"))[strsplit.ind2]) %>%
filter(CellType == type)
lung <- AddMetaData(object = lung, metadata = metadata)
lung <- NormalizeData(object = lung, normalization.method = "LogNormalize", scale.factor = 10000)
all.genes <- rownames(x = lung)
lung <- ScaleData(object = lung, features = all.genes)
Now that we have a Seurat object, let’s examine gene expression networks. First, we identify a subset of cell names that interest us. In our case, we’re looking at only patients 3 and 4, so we get the cell names of interest by subsetting the metadata.
cells <- metadata[metadata$"PatientNumber MS" %in% c(3:4), ]$cell
Next, we identify some list of metrics that interest up. DIMESS supports the 17 from the dismay package, here we use almost all of them.
metrics <- c("pearson", "spearman", "kendall", "bicor", "zi_kendall", "binomial",
"MI", "cosine", "jaccard", "canberra", "euclidean", "manhattan", "weighted_rank",
"hamming")
We then pass our cells and metrics to DIMESS’s build_gene_graphs function, which returns a list of tbl_graph items that can be passed to the other functions in the package.
graphs <- build_gene_graphs(lung, cells, metrics)
For example, we can plot some of the clusters for different metrics.
for (metric in c("pearson", "MI", "bicor")) {
plot_gene_clusters(graphs[[metric]], metric)
}
We could also plot the overall network, colored by centrality:
for (metric in c("cosine", "jaccard", "manhattan")) {
plot_network(graphs[[metric]], metric)
}
We can also find just the most central genes:
for (metric in c("euclidean", "binomial", "weighted_rank")) {
print(most_central(graphs[[metric]], 10))
}
## # A tibble: 10 x 1
## name
## <chr>
## 1 IGLC7
## 2 MUCL1
## 3 SPINK1
## 4 SCGB3A2
## 5 RARRES1
## 6 PAEP
## 7 S100A7
## 8 IGKC
## 9 S100A8
## 10 CCL20
## # A tibble: 10 x 1
## name
## <chr>
## 1 MUCL1
## 2 RARRES1
## 3 CXCL10
## 4 PAEP
## 5 PCSK1N
## 6 KRT17
## 7 S100A7
## 8 SERPINB4
## 9 SERPINB3
## 10 KRT23
## # A tibble: 10 x 1
## name
## <chr>
## 1 MUCL1
## 2 SPINK1
## 3 RARRES1
## 4 CXCL10
## 5 PAEP
## 6 S100A7
## 7 SERPINB4
## 8 AREG
## 9 CCL20
## 10 AGR2